Skip to content

Conversation

@Miciah
Copy link
Contributor

@Miciah Miciah commented Oct 4, 2024

With OSSM 3, the Maistra Istio Operator is replaced with new operator based on the upstream Sail Operator, and the ServiceMeshControlPlane CRD is replaced by the Istio CRD. Vendor the sail-operator API:

go mod edit -replace github.com/imdario/mergo=github.com/imdario/[email protected]
go get github.com/istio-ecosystem/sail-operator/api/v1
go mod tidy
go mod vendor

Note that vendoring sail-operator requires the mergo override.

OSSM 3.0 is based on Istio 1.24, which supports Gateway API v1.2.1. Copy in the updated Gateway API CRDs:

curl --silent --output-dir pkg/manifests/assets/gateway-api/ --remote-name 'https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/refs/heads/main/config/crd/standard/gateway.networking.k8s.io_{gatewayclasses,gateways,grpcroutes,httproutes,referencegrants}.yaml'

Update the conformance tests to reflect the features that Istio 1.24 supports.

Update the gatewayclass controller to create a subscription for the OSSM 3 Operator (which is in a separate channel from the OSSM 2 operator) and to create an Istio CR instead of a ServiceMeshControlPlane CR.

Currently, OSSM 3.0 is Tech Preview, so the subscription is actually for the latest Sail Operator nightly build.

Update client initialization, RBAC, and tests that used the Maistra APIs from OSSM 2 to use the sail-operator API for OSSM 3.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 4, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@Miciah
Copy link
Contributor Author

Miciah commented Oct 4, 2024

/test e2e-aws-gatewayapi
/test e2e-aws-operator

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch 2 times, most recently from 9af78a0 to 1ceec30 Compare October 4, 2024 21:18
@Miciah
Copy link
Contributor Author

Miciah commented Oct 4, 2024

/test e2e-aws-gatewayapi

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from 1ceec30 to 53ec66e Compare October 14, 2024 04:30
@Miciah
Copy link
Contributor Author

Miciah commented Oct 14, 2024

/test e2e-aws-gatewayapi

@candita
Copy link
Contributor

candita commented Oct 25, 2024

@Miciah Just fyi - this PR doesn't update the CRDs so this is still running v1beta1 Gateway API CRDs.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2025
@candita
Copy link
Contributor

candita commented Mar 5, 2025

/assign @candita, @alebedev87

@candita
Copy link
Contributor

candita commented Mar 5, 2025

/assign @candita @alebedev87

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from 53ec66e to fbba0d2 Compare March 6, 2025 18:00
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2025
@Miciah
Copy link
Contributor Author

Miciah commented Mar 6, 2025

/test e2e-aws-gatewayapi
/test e2e-aws-operator

Comment on lines 58 to 63
Channel: "stable",
Channel: "1.1-nightly", //"stable",
InstallPlanApproval: operatorsv1alpha1.ApprovalAutomatic,
Package: "servicemeshoperator",
CatalogSource: "redhat-operators",
Package: "sailoperator", //"servicemeshoperator3",
CatalogSource: "community-operators", //"redhat-operators",
CatalogSourceNamespace: "openshift-marketplace",
StartingCSV: "sailoperator.v1.1.0-nightly-2025-03-05",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OSSM 3.0.0TP2 still has the Istio v1alpha1 CRD, so I am using a Sail Operator nightly to get the Istio v1 CRD. I will update this subscription once OSSM 3.0 is GA.

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from fbba0d2 to 3e738c1 Compare March 6, 2025 20:36
@Miciah
Copy link
Contributor Author

Miciah commented Mar 6, 2025

I forgot to git add a couple of new files. https://github.com/openshift/cluster-ingress-operator/compare/fbba0d2acaa26ba4fea21233ba472426885945e7..3e738c1b36c922bcdac731effce285edbe8050d0 adds the missing files.

/test e2e-aws-gatewayapi
/test e2e-aws-operator

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from 3e738c1 to 4e54863 Compare March 6, 2025 22:43
@Miciah
Copy link
Contributor Author

Miciah commented Mar 6, 2025

/test e2e-aws-gatewayapi

@Miciah
Copy link
Contributor Author

Miciah commented Mar 6, 2025

@Miciah
Copy link
Contributor Author

Miciah commented Mar 7, 2025

I believe that the tests are failing because automated deployment/sidecar injection is not working with Sail Operator.

@Miciah
Copy link
Contributor Author

Miciah commented Mar 7, 2025

/test e2e-aws-gatewayapi

@Miciah
Copy link
Contributor Author

Miciah commented Mar 7, 2025

e2e-aws-gatewayapi failed because Step e2e-aws-gatewayapi-aws-deprovision-users-and-policies failed.

/test e2e-aws-gatewayapi

@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from e76ab75 to f5b8619 Compare March 8, 2025 00:44
@guicassolato
Copy link

You may be able to ditch that replace directive on github.com/imdario/mergo (as well as the dep altogether) if you can bump your k8s.io/* deps to v0.32.2+, k8s.io/client-go in particular.

Ref.: kubernetes/kubernetes#126764

Miciah added 2 commits March 21, 2025 00:39
When reconciling gatewayclasses in order to ensure the Istio CR, make sure to
use the oldest gatewayclass that has our controller name.

Before this commit, the controller behaved inconsistently:

- The controller's watch on gatewayclasses reconciled the specific gatewayclass
  that triggered the watch.

- The controller's watches on subscriptions, installplans, and istios reconciled
  the "openshift-default" gatewayclass.

This inconsistency caused several problems:

- If a gatewayclass existed with our controller name but none existed with the
  name "openshift-default", the controller logged spurious reconcile requests
  for, and failures to get, the "openshift-default" gatewayclass.

- If multiple gatewayclasses existed with our controller name, the controller
  tried to update the owner reference on the Istio CR every time the controller
  reconciled a gatewayclass other than the one in the owner reference.

Using the oldest gatewayclass solves these problems.

* pkg/operator/controller/gatewayclass/controller.go
(enqueueRequestForDefaultGatewayClassController): Rename...
(enqueueRequestForSomeGatewayClass): ...to this.  Delete the unused namespace
parameter.  Look up the oldest gatewayclass with our controller name, and return
a reconcile request for that gatewayclass.
(NewUnmanaged, Reconcile): Use enqueueRequestForSomeGatewayClass for all
watches.
Shorten the name of the gatewayclass that the conformance tests use in order to
avoid errors creating the service, which is named based on the gateway's and
gatewayclasses's names:

    Invalid value: "http-listener-isolation-with-hostname-intersection-gateway-conformance": must be no more than 63 characters

This failure is from the GatewayHTTPListenerIsolation test.  We do not currently
run this test by default.  This commit shortens the gatewayclass name in
preparation for enabling the test in the future.

* hack/gatewayapi-conformance.sh: Shorten the gatewayclass name.
@Miciah Miciah force-pushed the bump-to-OSSM-3.0-for-Gateway-API-support branch from 8055a97 to 25dde98 Compare March 21, 2025 04:42
@Miciah
Copy link
Contributor Author

Miciah commented Mar 21, 2025

https://github.com/openshift/cluster-ingress-operator/compare/8055a976678a3377646399d0110f8cbe4d0e9144..25dde982b0f91fe5bed88fc44da904d920f03d2a addresses a couple of review comments that I missed before:

  • Amend the comment for IngressOperatorOwnedAnnotation to say that this annotation is currently only intended for newly created subscriptions. I also split this change out into a separate commit as it isn't strictly related to the OSSM 3 bump.
  • Improve the E2E test log message when deleting an Istio and subsequently trying to get it results in an unexpected error.

/test e2e-aws-gatewayapi-conformance

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the follow-ups that were already created, I think things are looking good here.

git clone --branch "${BRANCH}" https://github.com/kubernetes-sigs/gateway-api
cd gateway-api

if [[ "$BUNDLE_VERSION" = "v1.2.1" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically you're going to want to cover anything in v1.2.x. Hopefully we don't have any more patches in v1.2, but you never know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't seen that the proposed backport at kubernetes-sigs/gateway-api#3688 had been closed.

However, I'm ambivalent about changing this, knowing that there's still a chance that there could be a v1.2.2 with the patch, or a v1.2.2 with some other change that causes the cherry-pick to fail.

If we bump to v1.2.2+ and find that the cherry-pick is still needed, we can update this test then with the knowledge that it remains applicable and necessary.

cd gateway-api

if [[ "$BUNDLE_VERSION" = "v1.2.1" ]]; then
echo "Cherry-picking fix for CoreDNS deployment issue in Gateway API v1.2.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo "Cherry-picking fix for CoreDNS deployment issue in Gateway API v1.2.1"
// This is a temporary workaround until Gateway API v1.3.x.
// See: https://github.com/kubernetes-sigs/gateway-api/pull/3389
echo "Cherry-picking fix for CoreDNS deployment issue in Gateway API v1.2.1"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I push another change, I'll include this comment. Otherwise, I'd rather not burn the CI cycles. It's easy enough to look up the PR from the commit.

Comment on lines -53 to +63
go test ./conformance -v -timeout 10m -run TestConformance -args --supported-features=${SUPPORTED_FEATURES}
go test ./conformance -v -timeout 10m -run TestConformance -args "--supported-features=${SUPPORTED_FEATURES}" "--gateway-class=${GATEWAYCLASS_NAME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THERE we go 👍

Comment on lines +77 to +91
for i := range gateways.Items {
if gateways.Items[i].Labels[key] == value {
continue
}
if string(gateways.Items[i].Spec.GatewayClassName) != o.GetName() {
continue
}
request := reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: gateways.Items[i].Namespace,
Name: gateways.Items[i].Name,
},
}
requests = append(requests, request)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
for i := range gateways.Items {
if gateways.Items[i].Labels[key] == value {
continue
}
if string(gateways.Items[i].Spec.GatewayClassName) != o.GetName() {
continue
}
request := reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: gateways.Items[i].Namespace,
Name: gateways.Items[i].Name,
},
}
requests = append(requests, request)
}
for _, gateway := range gateways.Items {
if gateway.Labels[key] == value {
continue
}
if string(gateway.Spec.GatewayClassName) != o.GetName() {
continue
}
request := reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: gateway.Namespace,
Name: gateway.Name,
},
}
requests = append(requests, request)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been tending towards using the index to avoid copying large values in the case of struct types, especially in hot loops or with large slices. For example, using the index can yield a significant performance improvement when iterating over thousands of routes. For the code in question, this might not be a particularly hot loop, and there probably aren't thousands or even hundreds of gateways, but I still prefer not to copy the gateway if we only need a few of its fields.

@candita
Copy link
Contributor

candita commented Mar 21, 2025

Hopefully this was a transient error.

gateway_api_test.go:147: failed to find expected Istio operator: error finding deployment openshift-operators/servicemesh-operator3: context deadline exceeded

/test e2e-aws-gatewayapi

@candita
Copy link
Contributor

candita commented Mar 21, 2025

/retest

@candita
Copy link
Contributor

candita commented Mar 22, 2025

/retest-required

@melvinjoseph86
Copy link

/retest

@Miciah
Copy link
Contributor Author

Miciah commented Mar 24, 2025

/retest-required

1 similar comment
@Miciah
Copy link
Contributor Author

Miciah commented Mar 24, 2025

/retest-required

@Miciah
Copy link
Contributor Author

Miciah commented Mar 24, 2025

An infrastructure issue (error creating role assignments because we hit a limit) is causing at least the hypershift-e2e-aks failures. I'm not sure about e2e-hypershift. The issue is being mitigated (by deleting some old role assignments).

/retest-required

@Miciah
Copy link
Contributor Author

Miciah commented Mar 24, 2025

I believe the e2e-hypershift test failures were caused by OCPBUGS-53462.

The hypershift-e2e-aks job failed again. This time I don't see errors about role assignments, but TestNodePool/HostedCluster0/Main/TestNTOMachineConfigGetsRolledOut failed. The hypershift-e2e-aks job passed on #1204, which includes all the commits form this PR, and I found OCPBUGS-43824, so it seems that the test is flaky.

/test hypershift-e2e-aks

@candita
Copy link
Contributor

candita commented Mar 24, 2025

/approve

@candita
Copy link
Contributor

candita commented Mar 24, 2025

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2025
@candita
Copy link
Contributor

candita commented Mar 24, 2025

Sorry, I'm seeing things.
/unhold
/approve

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@candita
Copy link
Contributor

candita commented Mar 24, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2025
@lihongan
Copy link
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 25, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 25, 2025

@Miciah: This pull request references NE-1934 which is a valid jira issue.

In response to this:

With OSSM 3, the Maistra Istio Operator is replaced with new operator based on the upstream Sail Operator, and the ServiceMeshControlPlane CRD is replaced by the Istio CRD. Vendor the sail-operator API:

go mod edit -replace github.com/imdario/mergo=github.com/imdario/[email protected]
go get github.com/istio-ecosystem/sail-operator/api/v1
go mod tidy
go mod vendor

Note that vendoring sail-operator requires the mergo override.

OSSM 3.0 is based on Istio 1.24, which supports Gateway API v1.2.1. Copy in the updated Gateway API CRDs:

curl --silent --output-dir pkg/manifests/assets/gateway-api/ --remote-name 'https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/refs/heads/main/config/crd/standard/gateway.networking.k8s.io_{gatewayclasses,gateways,grpcroutes,httproutes,referencegrants}.yaml'

Update the conformance tests to reflect the features that Istio 1.24 supports.

Update the gatewayclass controller to create a subscription for the OSSM 3 Operator (which is in a separate channel from the OSSM 2 operator) and to create an Istio CR instead of a ServiceMeshControlPlane CR.

Currently, OSSM 3.0 is Tech Preview, so the subscription is actually for the latest Sail Operator nightly build.

Update client initialization, RBAC, and tests that used the Maistra APIs from OSSM 2 to use the sail-operator API for OSSM 3.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Miciah
Copy link
Contributor Author

Miciah commented Mar 25, 2025

/label acknowledge-critical-fixes-only

Per TRT, "Feature gated code is free to use this label."

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Mar 25, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 25, 2025

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 1b72b75 into openshift:master Mar 25, 2025
20 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-ingress-operator
This PR has been included in build ose-cluster-ingress-operator-container-v4.19.0-202503251208.p0.g1b72b75.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.